ABSTRACT


The rapid development of computed tomography, ultrasound, magnetic resonance imaging and
other 3D medical imaging modalities has inspired corresponding development of visualization
methods for this data. Described in this paper are some of these methods. Emphasized are
volume rendering techniques that generate extremely high-quality images directly from the 3D
data. Also described are less computation-intensive methods based on extracted polygonal surface
representations. The polygon-based methods can already be used for interactive visual
exploration; volume renderings will become interactive with the next generation of graphics
computers. We also briefly describe two unusual display systems, one based on a vibrating
varifocal mirror, the other based on a head-mounted display, that enhance interactive
visualization and manipulation of 3D medical data.
New imaging modalities represent an embarrassment of riches to the medical display specialist.
Imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and
ultrasound produce image data in the form of a scalar intensity throughout a three-dimensional
region. This scalar value may indicate some physical property of the imaged tissue or the
boundary strength within this physical property. Typically, the data is arranged as a stack of
parallel slices or as a collection of slices, each at a different angle through some line in space.

No  current  display  technique  can  effectively  transmit  all  of  this  3D  scalar  intensity  data  to  the 
clinician.  It  is  not  even  clear  what  an  ideal  display  rendering  would  look  like.  Because  we  are 
used to opaque objects in our everyday world, even the computer graphics movement toward ever
greater photorealism fails to provide a dependable guide. Even a stunningly realistic image of
the  patient  may  not  be  satisfactory.  The  clinician  may  wish  to  study  a  tumor  deep  inside  some 
organ  while  simultaneously  viewing  surrounding  tissue  for  orientation. 

Since  we  are  used  to  seeing  collections  of  objects,  most  with  opaque  surfaces,  many  3D  medical 
display  systems  rely  on  well-developed  polygon-based  rendering  techniques.  Polygon-based 
techniques,  however,  incur  the  serious  problem  of  requiring  a  polygon  description  to  be  extracted 
from  the  3D  data.  Extracting  a  polygonal  description  of  an  object  from  3D  image  data  consists  of 
classifying  the  parts  of  that  volume  into  object  and  non-object  and  then  defining  a  skin  of  polygons 
to approximate the object region. Although this extraction is feasible for objects that have
easy-to-find boundaries (skin, bone), it is quite difficult for many other objects of interest such as
tumors and other soft tissue structures encased within soft tissue. A further problem is that any binary
decision leads to false positives (spurious objects) and false negatives (missing objects). Of course,
any interpretation based only on extracted data will completely miss any items whose surfaces
have not been extracted.
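To make the two extraction steps concrete, the following is a minimal sketch, not the method of any system described here. It assumes a single intensity threshold as the object/non-object classifier and uses a cuberille-style skin, in which one square face is emitted wherever an object voxel borders a non-object voxel; the function names `classify` and `boundary_faces` are hypothetical. Because the decision is binary, a noisy voxel just above the threshold becomes a spurious object, and a faint structure just below it disappears entirely.

```python
import numpy as np

def classify(volume, threshold):
    """Binary object/non-object decision: True where intensity >= threshold."""
    return volume >= threshold

def boundary_faces(mask):
    """List the voxel faces that separate object from non-object voxels.

    Each face is returned as (voxel_index, axis, direction), with direction
    -1 or +1 along that axis. Voxels outside the array count as non-object,
    so objects touching the border are still closed.
    """
    padded = np.pad(mask, 1, constant_values=False)
    faces = []
    for axis in range(3):
        for direction in (-1, 1):
            # neighbor[i] holds padded[i + direction] along this axis
            neighbor = np.roll(padded, -direction, axis=axis)
            exposed = padded & ~neighbor
            for idx in zip(*np.nonzero(exposed)):
                # subtract 1 to undo the padding offset
                faces.append((tuple(int(i) - 1 for i in idx), axis, direction))
    return faces
```

Practical systems generally fit smoother surfaces than this, but the binary decision at the heart of the extraction is the same.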
To avoid these problems, researchers have begun exploring volume rendering, a visualization
technique that does not require binary classification of the incoming data. Images are formed by
computing a color and partial transparency for all voxels and projecting them onto the picture
plane. The lack of explicit geometry does not preclude the display of surfaces, as demonstrated by
the figures in this paper. The key improvement offered by volume rendering is that it provides a
mechanism for displaying weak or fuzzy surfaces.
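The projection step can be illustrated for a single viewing ray. The sketch below assumes the ray's samples have already been classified into (color, opacity) pairs and ordered nearest-first, and applies standard front-to-back compositing; `composite_ray` is a hypothetical name, and grayscale colors are used for brevity.

```python
def composite_ray(samples):
    """Front-to-back compositing of (color, alpha) samples along one ray.

    samples: iterable of (color, alpha) pairs ordered nearest-first, with
    color a float intensity and alpha an opacity in [0, 1].
    Returns the accumulated (color, alpha) for the ray.
    """
    acc_color, acc_alpha = 0.0, 0.0
    for color, alpha in samples:
        # Only light not already blocked by nearer samples can contribute.
        weight = (1.0 - acc_alpha) * alpha
        acc_color += weight * color
        acc_alpha += weight
    return acc_color, acc_alpha
```

A half-transparent sample in front of an opaque one contributes half of the final color, which is how weak or fuzzy surfaces remain visible without any binary decision.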

These visualization techniques, however, increase the need for interaction; more parameters now
have to be specified: precise viewing positions, regions to cut away, regions to highlight, light
sources to select and position, etc. With volume visualization from original scanned data,
extemporaneous exploration promises new understanding of the original data, showing subtleties
invisible with coarser techniques. Making these structures visible, however, may not be a simple task. Even
assuming  that  the  rendering  techniques  are  adequate,  interaction  is  necessary  to  allow  the  user  to 
remove  obscuring  portions  of  the  data  in  such  a  delicate  way  that  the  details  of  interest  are  not 
inadvertently  removed.  The  process  may  be  roughly  analogous  to  an  archaeologist  gently 
removing  dirt  to  reveal  a  delicate  fossil. 

Interaction  is  also  needed  to  decide  the  visual  interpretation  of  the  various  regions — which 
regions  to  make  totally  transparent  (invisible),  which  to  make  partially  translucent,  which 
opaque. The color assignments and the reflectivity of various "surfaces" in the data also have to
be selected. It is important to remember that photorealism cannot be our only guide in this task.
The inside of the human body is mostly opaque and bloody, and one cannot see much of its structure
from any single viewpoint, even while doing exploratory surgery. With computer rendering
techniques we might see more detail (if we are very fortunate), but the most useful images may not
be the ones that look the most realistic in the conventional sense.
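One common way to expose these transparency decisions to the user is a piecewise-linear transfer function from scalar intensity to opacity whose breakpoints are adjusted interactively. The control points below are hypothetical illustrations, not values used in this work; `numpy.interp` performs the linear interpolation between them.

```python
import numpy as np

# Hypothetical (intensity, opacity) control points a user might set:
# intensities up to 300 are invisible, a middle band is faintly translucent,
# and intensities of 1200 and above are fully opaque.
CONTROL_POINTS = [(0, 0.0), (300, 0.0), (400, 0.1), (1000, 0.1), (1200, 1.0), (4095, 1.0)]

def opacity_transfer(volume, control_points=CONTROL_POINTS):
    """Map each voxel's scalar intensity to an opacity in [0, 1]."""
    xs, ys = zip(*control_points)       # xs must be increasing for np.interp
    return np.interp(volume, xs, ys)
```

Moving a breakpoint and re-rendering is exactly the kind of interaction the text calls for; color could be assigned by a second function of the same form.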

In certain applications, we are also trying to comprehend more than just anatomy from a computer
rendering. We want to add to the anatomy such information as the radiation density of a treatment
plan or the degree of blood perfusion that indicates lung activity levels.

Despite  all  their  advantages,  volume  rendering  methods  are  too  computationally  expensive  for 
today's  computers;  polygon-based  methods  are  still  the  method  of  choice  for  many  applications 
that require user interaction. We describe in the following sections some software and hardware
systems  that  we  use  to  achieve  interactive  polygon-based  3D  medical  display. 

We also describe in this paper a pair of special display devices that ameliorate certain
difficulties: the varifocal mirror display may allow immediate, real-time "true" 3D
visualization of individual data points (such as those from real-time ultrasound devices); the
head-tracked head-mounted display will allow "true" 3D display superimposed on the patient
during ultrasound acquisition.


RENDERING  METHODS 
Volume Rendering

Recently  we  have  been  concentrating  on  volume  rendering,  a  visualization  technique  in  which  a 
color and an opacity are assigned to each voxel, and a 2D projection of the resulting colored
semi-transparent gel is computed [Levoy 1988a, Drebin 1988, Sabella 1988, Upson 1988]. The principal
advantages of volume rendering over other visualization techniques are its superior image quality
and its ability to generate images without explicitly defining surfaces. Our recent efforts in this
area  have  addressed  some  of  the  drawbacks  of  volume  rendering,  including  high  rendering  cost  and 
the  difficulty  of  mixing  analytically  defined  geometry  and  volumetric  data  in  a  single 
visualization. 
Reducing the Cost of Volume Rendering

Since  all  voxels  participate  in  the  generation  of  each  image,  rendering  time  grows  linearly  with 
the  size  of  the  dataset.  This  cost  can  be  reduced,  however,  by  taking  advantage  of  various  forms  of 
coherence.  Three  such  optimizations  are  summarized  here  and  described  in  detail  in  [Levoy  1988b, 
Levoy  1989]. 

The  first  optimization  is  based  on  the  observation  that  many  datasets  contain  coherent  regions  of 
uninteresting voxels. A voxel is defined as uninteresting if its opacity is zero. Methods for
encoding coherence in volume data include octrees [Meagher 1982], polygonal representations of
bounding surfaces [Pizer 1986], and octree representations of bounding surfaces [Gargantini 1986].
Methods for taking advantage of coherence during ray tracing include cell decompositions (also
known as bounding volumes) [Rubin 1980] and spatial occupancy enumerations (also known as space
subdivisions) [Glassner 1984]. In our work, we employ a hierarchical enumeration represented by a
pyramid of binary volumes. The pyramid is used to efficiently compute intersections between
viewing rays and regions of interest in the data.
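A pyramid of binary volumes can be built by OR-reducing 2 x 2 x 2 blocks level by level, so that a false entry at any level certifies that the whole corresponding block of voxels is uninteresting and can be skipped by a ray. The sketch below assumes a cubic volume whose side is a power of two and covers only construction and lookup; the ray traversal described in [Levoy 1988b] is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def build_pyramid(opacity):
    """Build a pyramid of binary volumes from a cubic opacity array.

    Level 0 marks interesting (nonzero-opacity) voxels; each higher level
    marks the 2x2x2 blocks of the level below that contain any interesting
    voxel. Assumes the side length is a power of two.
    """
    level = opacity > 0
    pyramid = [level]
    while level.shape[0] > 1:
        n = level.shape[0] // 2
        # Split each axis into (block, offset-within-block) and OR over offsets.
        level = level.reshape(n, 2, n, 2, n, 2).any(axis=(1, 3, 5))
        pyramid.append(level)
    return pyramid

def block_is_empty(pyramid, level, i, j, k):
    """True if block (i, j, k) at the given level holds no interesting voxels."""
    return not pyramid[level][i, j, k]
```

A ray that finds an empty block at a coarse level can advance past the whole block in one step instead of sampling each of its voxels.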

The  second  optimization  is  based  on  the  observation  that  once  a  ray  has  struck  an  opaque  object  or 
has progressed a sufficient distance through a semi-transparent object, opacity accumulates to a
level at which the color of the ray stabilizes and ray tracing can be stopped. The idea of adaptively
terminating ray tracing was first proposed in [Whitted 1980]. Many algorithms for displaying
medical data stop after encountering the first surface or the first opaque voxel. In this guise, the
idea has been reported by numerous researchers [Goldwasser 1986, Tiede 1988, Schlusselberg 1986,
Trousset 1987]. In volume rendering, surfaces are not explicitly detected. Instead, they arise in the
form  of  a  surface  likelihood  level  and  appear  in  the  image  as  a  natural  by-product  of  the  stepwise 
accumulation  of  color  and  opacity  along  each  ray.  We  implement  adaptive  termination  of  ray 
tracing  by  stopping  when  opacity  reaches  a  user-selected  threshold  level. 
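Adaptive termination changes the compositing loop only slightly: accumulation stops as soon as the ray's opacity reaches the user-selected threshold. In this sketch the default threshold of 0.95 is an illustrative assumption, and the function also reports how many samples were actually used so the savings are visible.

```python
def composite_ray_early_exit(samples, threshold=0.95):
    """Front-to-back compositing that stops once accumulated opacity
    reaches a user-selected threshold.

    Returns (color, alpha, samples_used).
    """
    acc_color, acc_alpha, used = 0.0, 0.0, 0
    for color, alpha in samples:
        if acc_alpha >= threshold:
            break                      # ray color has effectively stabilized
        weight = (1.0 - acc_alpha) * alpha
        acc_color += weight * color
        acc_alpha += weight
        used += 1
    return acc_color, acc_alpha, used
```

If `samples` is a generator that computes samples on demand, the early exit also avoids computing the samples that would have been discarded.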

If  there  is  coherence  present  in  a  dataset,  there  may  also  be  coherence  present  in  its  projections. 
This  is  particularly  true  for  data  acquired  from  sensing  devices,  where  the  acquisition  process 
often  introduces  considerable  blurring.  The  third  optimization  takes  advantage  of  this  coherence 
by  casting  a  sparse  grid  of  rays,  less  than  one  per  pixel,  and  adaptively  increasing  the  number  of 
rays  in  regions  of  high  image  complexity.  In  classical  ray  tracing,  methods  for  distributing  rays 
nonuniformly include recursive subdivision of image space [Whitted 1980] and stochastic sampling
[Lee  1985,  Dippe  1985,  Cook  1986,  Kajiya  1986].  Methods  for  measuring  local  image  complexity 
include  color  differences  [Whitted  1980]  and  statistical  variance  [Lee  1985,  Kajiya  1986].  We 
employ  recursive  subdivision  based  on  local  color  differences.  The  approach  is  similar  to  that 
described  by  Whitted,  but  extended  to  allow  sampling  densities  of  less  than  one  ray  per  pixel. 
Images are formed from the resulting nonuniform array of sample colors by interpolation and
resampling  at  the  display  resolution. 
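The recursive subdivision scheme can be sketched in image space as follows. The sketch assumes grayscale colors, a square image, and a hypothetical caller-supplied `cast_ray(x, y)` standing in for the volume ray tracer. It starts from a grid coarser than one ray per pixel, subdivides a square only where its corner colors differ by more than a tolerance, and fills the remaining pixels by bilinear interpolation. It illustrates the idea and is not the implementation described in [Levoy 1989].

```python
import numpy as np

def adaptive_render(cast_ray, size, initial_step=8, epsilon=0.1):
    """Adaptive image-space sampling by recursive subdivision (a sketch).

    Assumes initial_step is a power of two and (size - 1) is a multiple of it.
    Returns the image and the number of rays actually cast.
    """
    samples = {}  # (x, y) -> color; each ray is cast at most once

    def sample(x, y):
        if (x, y) not in samples:
            samples[(x, y)] = cast_ray(x, y)
        return samples[(x, y)]

    image = np.zeros((size, size))

    def refine(x0, y0, step):
        c00, c10 = sample(x0, y0), sample(x0 + step, y0)
        c01, c11 = sample(x0, y0 + step), sample(x0 + step, y0 + step)
        corners = (c00, c10, c01, c11)
        if step > 1 and max(corners) - min(corners) > epsilon:
            half = step // 2
            for dy in (0, half):
                for dx in (0, half):
                    refine(x0 + dx, y0 + dy, half)
            return
        # Colors are smooth enough here: interpolate instead of casting rays.
        for dy in range(step + 1):
            for dx in range(step + 1):
                u, v = dx / step, dy / step
                image[y0 + dy, x0 + dx] = ((1 - u) * (1 - v) * c00 + u * (1 - v) * c10
                                           + (1 - u) * v * c01 + u * v * c11)

    for y0 in range(0, size - 1, initial_step):
        for x0 in range(0, size - 1, initial_step):
            refine(x0, y0, initial_step)
    # Interpolation along shared edges can overwrite exact samples; restore them.
    for (x, y), color in samples.items():
        image[y, x] = color
    return image, len(samples)
```

For a smoothly varying image most pixels are interpolated; rays concentrate along edges where the color changes abruptly.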

Combining these three optimizations, savings of more than two orders of magnitude over
brute-force rendering methods have been obtained for many datasets. Alternatively, the adaptive
sampling method allows a sequence of successively more refined images to be generated at evenly
spaced intervals of time by casting more rays, adding the resulting colors to the sample array, and
repeating the interpolation and resampling steps. Crude images can often be obtained in a few
seconds, followed by gradually better images at intervals of a few seconds each, culminating in a
high-quality image in less than a minute.

Mixing  Geometric  and  Volumetric  Data 

Let us now examine the problem of mixing geometric and volumetric data in a single visualization.
Clinical applications include superimposition of radiation treatment beams over patient anatomy
for the oncologist and display of medical prostheses for the orthopedist. We have decided to
restrict ourselves to methods capable of handling semi-transparent polygons. This constraint
eliminates 2-1/2D schemes such as image compositing [Porter 1984] and depth-enhanced
compositing [Duff 1985], although such techniques can produce useful visualizations, as
demonstrated by [Goodsell 1988]. We summarize here two methods for rendering these mixtures.
More detailed descriptions are given in a separate technical report [Levoy 1988c].


The  first  method  employs  a  hybrid  ray  tracer  (Fig.  1).  Since  its  introduction,  ray  tracing  [Whitted 
1980] has been extended to handle more types of objects than possibly any other rendering method.
Its applicability to scalar fields has been demonstrated by numerous researchers [Kajiya 1984,
Levoy 1988a, Sabella 1988, Upson 1988]. The idea of a hybrid ray tracer that handles both scalar
fields and polygons has been proposed many times, but no implementation of it has yet been
reported. In our method, rays are cast through the ensemble of volumetric and geometric data, and
samples of each are drawn and composited in depth-sorted order. To avoid errors in visibility,
volumetric samples lying immediately in front of and behind polygons require special treatment.
To  avoid  aliasing  of  polygonal  edges,  adaptive  supersampling  is  employed.  Geometric  and 
volumetric  data  exhibit  qualitatively  different  frequency  spectra,  however,  so  care  must  be  taken 
when  distributing  rays.  These  issues  are  addressed  in  detail  in  the  referenced  technical  report. 
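The core of the hybrid ray tracer, compositing volumetric samples and polygon intersections in depth-sorted order, can be sketched for one ray as below. The sketch assumes both kinds of samples are already available as (depth, color, opacity) tuples sorted by depth, and deliberately omits the special treatment of samples adjacent to polygons and the adaptive supersampling discussed above; `hybrid_composite` is a hypothetical name.

```python
import heapq

def hybrid_composite(volume_samples, polygon_hits):
    """Composite volumetric samples and polygon intersections along one ray.

    Both inputs are lists of (depth, color, alpha) tuples sorted by
    increasing depth; heapq.merge interleaves them in depth order.
    """
    acc_color, acc_alpha = 0.0, 0.0
    for depth, color, alpha in heapq.merge(volume_samples, polygon_hits,
                                           key=lambda s: s[0]):
        weight = (1.0 - acc_alpha) * alpha   # front-to-back compositing
        acc_color += weight * color
        acc_alpha += weight
    return acc_color, acc_alpha
```

Because a semi-transparent polygon is just another (color, opacity) sample in the merged sequence, volumetric data both in front of and behind it contributes correctly.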

The  second  method  we  have  developed  involves  3D  scan-conversion,  an  extension  into  three 
dimensions  of  the  more  commonly  used  2D  technique.  Formally,  3D  scan-conversion  transforms  a 
solid  object  from  a  boundary  representation  into  a  spatial  occupancy  enumeration.  By  treating 
surfaces as infinitely thin solids and making certain assumptions about the transformation process,
they may be handled as well. Efficient algorithms exist for 3D scan-conversion of polygons
[Kaufman 1987b], polyhedra [Kaufman 1986], and cubic parametric curves, surfaces, and volumes
[Kaufman 1987a]. In all of these cases, a binary voxel representation is used, resulting in aliasing
in the generated images. To avoid these artifacts, the object must be bandlimited to the Nyquist
frequency in all three dimensions, then sampled in a manner that limits losses due to quantization.
The  ability  of  volume  rendering  to  represent  partial  opacity  makes  it  suitable  for  this  task.  Like 
the  hybrid  ray  tracer,  this  idea  has  been  suggested  before  but  not  published.  In  our  method, 
polygons  are  shaded,  filtered,  sampled,  combined  with  volumetric  data,  and  the  composite 
dataset  is  rendered  using  published  techniques.  No  particular  care  need  be  taken  in  the  vicinity  of 
sampled  geometry,  and  no  supersampling  is  required.  If  polygons  are  sufficiently  bandlimited 
prior  to  sampling,  this  approach  produces  images  free  from  aliasing  artifacts. 
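As an illustration of sampling bandlimited geometry, the sketch below scan-converts a plane into a voxel grid with fractional opacities rather than a binary occupancy. It assumes the polygon is large enough to treat as an infinite plane and applies a tent (linear) filter of a chosen radius to the distance from each voxel center to the plane; the paper does not specify its filter, so both the filter and the function name `scan_convert_plane` are assumptions.

```python
import numpy as np

def scan_convert_plane(shape, normal, offset, radius=1.0, opacity=1.0):
    """Antialiased 3D scan conversion of the plane dot(normal, p) = offset.

    The volume is indexed [z, y, x] and normal is given as (nx, ny, nz).
    Each voxel receives opacity * max(0, 1 - d / radius), where d is the
    distance from the voxel center to the plane.
    """
    n = np.asarray(normal, dtype=float)
    norm = np.linalg.norm(n)
    n, offset = n / norm, offset / norm          # normalize the plane equation
    z, y, x = np.indices(shape, dtype=float)     # voxel centers at integer coords
    d = np.abs(n[0] * x + n[1] * y + n[2] * z - offset)
    return opacity * np.clip(1.0 - d / radius, 0.0, 1.0)
```

The resulting opacities could then be merged with the volume's own opacities, for example by taking the elementwise maximum with `np.maximum`; that merging rule is one simple choice, not necessarily the one used here.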

Adding  Shadows  and  Textures 

To  compare  the  relative  versatility  of  these  two  rendering  methods,  we  have  also  developed 
methods  for  adding  shadows  and  textures  (Figs.  2  &  3).  Max  has  written  a  brief  but  excellent 
survey  of  algorithms  for  casting  shadows  [Max  1986].  We  employ  a  two-pass  approach  [Williams 
1978],  but  store  shadow  information  in  a  3D  light  strength  buffer  instead  of  a  2D  shadow  depth 
buffer. The amount of memory required for a 3D buffer is obviously much greater, but the
representation  has  several  advantages.  By  computing  a  fractional  light  strength  at  every  point  in 
space,  penumbras  and  shadows  cast  by  semi-transparent  objects  are  correctly  rendered.  Moreover, 
the  shadow  aliasing  problem  encountered  by  Williams  does  not  occur.  Finally,  our  algorithm 
correctly  handles  shadows  cast  by  volumetrically  defined  objects  on  themselves,  as  well  as 
shadows cast by polygons on volumetric objects and vice versa.
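The first pass of the two-pass approach can be sketched with a 3D light strength buffer for the simplest case: a directional light shining along one axis of the volume. Each slice passes on the fraction of light not absorbed by its opacity, so semi-transparent voxels cast partial shadows. The optional `initial` array of starting light strengths also shows how a texture could be projected through the dataset, as in Figure 3. The axis-aligned light and the function names are simplifying assumptions.

```python
import numpy as np

def light_strength_buffer(opacity, initial=None):
    """First pass: fraction of light reaching each voxel.

    Assumes a directional light travelling along axis 0 of an opacity
    volume indexed [z, y, x]. `initial` is an optional 2D array of starting
    light strengths (1.0 everywhere by default), e.g. a projected texture.
    """
    depth = opacity.shape[0]
    light = np.empty(opacity.shape, dtype=float)
    light[0] = 1.0 if initial is None else initial
    for k in range(1, depth):
        # Each slice transmits (1 - opacity) of the light that reached it.
        light[k] = light[k - 1] * (1.0 - opacity[k - 1])
    return light

def shade(color, light):
    """Second pass (simplified): scale each voxel's color by its light strength."""
    return color * light
```

Because the buffer stores a fractional strength at every voxel rather than a single depth per light ray, the same pass handles self-shadowing of volumetric objects and shadows from semi-transparent material.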

Wrapping  textures  around  volumetrically  defined  objects  requires  knowing  where  their  defining 
surfaces  lie — a  hard  problem.  Projecting  textures  through  space  and  onto  these  surfaces  is  much 
easier  and  can  be  handled  by  a  straightforward  extension  of  the  shadow-casting  algorithm. 
Mapping  textures  onto  polygons  embedded  in  volumetric  datasets  is  also  relatively  simple  and 
readily added to the hybrid ray tracer already described. A good survey of texture mapping
algorithms  has  been  written  by  Heckbert  [1986].  We  employ  a  method  similar  to  that  described  by 
Feibush  [1980],  but  since  geometric,  volumetric,  and  texture  data  each  exhibit  different  spectra, 
care  must  be  taken  when  mixing  them.  A  detailed  description  of  our  texturing  and  shadowing 
algorithms  is  contained  in  the  technical  report  already  referred  to  [Levoy  1988c]. 


Figure  1:  Volume  rendering  of  same  CT  study  as  figure  3,  showing  both  bone  and  soft  tissue.  A 
polygonally defined tumor (in purple) and radiation treatment beam (in blue) have been added
using  our  hybrid  ray  tracer.  A  portion  of  the  CT  data  has  been  clipped  away  to  show  the  3D 
relationships  between  the  various  objects. 


Figure 2: Volume rendering of same CT study as figure 3. Three polygons have been embedded in
the  study  using  our  hybrid  ray  tracer,  and  a  texture  image  has  been  mapped  onto  each  polygon. 
Although  a  whimsical  texture  has  been  used  here,  the  technique  could  be  used  to  display 
measurement grids or secondary datasets.


Figure 3: Volume rendering of a 256 x 256 x 113 voxel CT study of a human head. Five analytically
defined  slabs  have  been  embedded  in  the  study  using  our  3D  scan-conversion  algorithm,  and 
shadows  have  been  cast  from  an  imaginary  light  source.  Initial  light  strengths  were  assigned  from 
a  texture  containing  a  filtered  rectangular  grid,  effectively  projecting  this  texture  through  the 
dataset  and  onto  all  illuminated  surfaces.  This  technique  might  be  used  to  identify  tissue  surfaces 
irradiated  by  a  radiation  treatment  beam  originating  from  a  specified  direction.  Shadow  masks 
placed  in  front  of  the  beam  would  allow  custom  field  shapes,  reticles  and  crosshairs  to  be  projected 
onto  anatomical  structures. 


Editing  volume  data 

For  some  renderings  we  first  edit  the  original  3D  dataset  to  select  a  subset  of  the  region  to  be 
rendered.  Figure  4  shows  an  early  attempt  at  one  such  rendering.  The  thin  white  curved  line  in 
the  large  image  of  Figure  5  indicates  the  region  in  one  2D  slice  image  from  which  Figure  4  was 
rendered. A view of the screen from our new program, IMEX, for editing image data on multi-window
workstation displays is shown in Figure 6 [Mills et al. 1989].


Figure 4: Rendering of a 63 x 256 x 256-voxel abdominal region indicated in Fig. 5. Horizontal
striations  on  the  kidneys,  etc.,  are  due  to  patient  breathing  movement  during  CT  acquisition. 


Figure 5: Sample CT slice images, with each slice's region to be rendered in Fig. 4 indicated by a
thin  white  line.  Notice  that  the  line  does  not  define  the  surface  of  the  individual  objects. 
Figure 6: Screen view of new interactive IMEX (IMage Executive) program used for viewing
images at various magnification levels, intensity windowing, defining regions of interest, and
defining object contours.


Real-time  volume  rendering 

Pixel-Planes  4,  a  raster  graphics  engine  for  high-speed  rendering  of  3D  objects  and  scenes,  has  been 
running in our laboratory since the late summer of 1986 [Eyles et al. 1988, Fuchs et al. 1988]. Under
joint funding by DARPA and the NSF, we are currently developing Pixel-Planes 5, a new
generation that promises to have unprecedented power and flexibility. It will consist of 32 10- to
20-MFLOP graphics processors, 1/4 million pixel processors, a 1024 x 768-pixel color frame buffer,
and a 5 Gbit/sec ring network. We expect the machine to become operational sometime during the
summer or fall of 1989.

Although Pixel-Planes 5 was not explicitly designed for volume rendering, its flexibility makes it
surprisingly well suited to the task. Briefly, we plan to store the function value and gradient for
several voxels in the backing store of each pixel processor. The processor would then perform the
classification and shading calculations for all voxels in its backing store. The time to apply a
monochrome Phong shading model at a single voxel using a pixel processor is about 1 msec. For a
256 x 256 x 256 voxel dataset, each pixel processor would be assigned 64 voxels, so the time required
to classify and shade the entire dataset would be about 64 msec. The tracing of rays to generate an
image  will  be  done  by  the  graphics  processors.  Each  processor  will  be  assigned  a  set  of  rays.  They 
will  request  sets  of  voxels  from  the  pixel  processors  as  necessary,  perform  the  tri-linear 
interpolation  and  compositing  operations,  and  transmit  the  resulting  pixel  colors  to  the  frame 
buffer  for  display. 
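The shading estimate above can be checked with a few lines of arithmetic. Reading "1/4 million pixel processors" as 2^18 = 262,144 is an assumption, but it makes the division exact; the function name is hypothetical.

```python
def shading_time_ms(volume_side=256, pixel_processors=2 ** 18, ms_per_voxel=1.0):
    """Classify-and-shade time when voxels are spread evenly over pixel
    processors that all work in parallel (assumes the division is exact)."""
    voxels = volume_side ** 3                      # 16,777,216 for 256^3
    voxels_per_processor = voxels // pixel_processors
    return voxels_per_processor, voxels_per_processor * ms_per_voxel
```

With the defaults this returns 64 voxels per processor and 64 msec, matching the figures in the text.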

The  success  of  this  approach  depends  on  reducing  the  number  of  voxels  flowing  from  the  pixel 
processors  to  the  graphics  processors.  Three  strategies  are  planned.  First,  the  pyramid  of  binary 
volumes  described  in  [Levoy  1988b]  will  be  installed  in  each  graphics  processor.  This  data 
structure  encodes  the  coherence  present  in  the  dataset,  telling  the  graphics  processor  which  voxels 
are  interesting  (non-transparent)  and  hence  worth  requesting  from  the  pixel  processors.  Second, 
the adaptive sampling scheme described in [Levoy 1989] will be used to reduce the number of rays
required to generate an initial image. Last, all voxels received by a graphics processor will be
retained in a local cache. If the observer does not move during generation of the initial image, the
cached voxels will be used to drive successive refinement of the image. If the observer moves,
many of the voxels required to generate the next frame may already reside in the cache, depending
on how far the observer moves between frames.
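The paper does not specify how the voxel cache is managed; one plausible policy is least-recently-used replacement, sketched below. The class name and the `fetch` callable, which stands in for a request to the pixel processors, are hypothetical.

```python
from collections import OrderedDict

class VoxelCache:
    """Per-graphics-processor voxel cache with LRU replacement (a sketch)."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # called only on a miss
        self.capacity = capacity    # maximum number of cached blocks
        self.blocks = OrderedDict()
        self.misses = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data
```

When the observer moves only slightly, most requests for the next frame hit blocks that are still cached, so only the miss count, not the full dataset, crosses the ring network.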

The frame rate we expect from this system depends on what parameters change from frame to
frame. Preliminary estimates suggest that for changes in observer position alone, we will be able
to generate a sequence of slightly coarse images at 10 frames per second and a sequence of images of
the quality of Figure 3 at 1 frame per second. For changes in shading, or changes in classification
that do not invalidate the hierarchical enumeration, we expect to obtain about 20 coarse or 2
high-quality images per second. This includes highlighting and interactively moving a region of
interest, which we plan to implement by heightening the opacity of voxels inside the region
and attenuating the opacities of voxels outside the region. If the user changes the classification
mapping  in  such  a  way  as  to  alter  the  set  of  interesting  voxels,  the  hierarchical  enumeration  must 
be  recomputed.  We  expect  this  operation  to  take  several  seconds. 
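Region-of-interest highlighting can be sketched as a reweighting of opacities. Because multiplying by a positive factor keeps zero opacities at zero and nonzero opacities nonzero, this kind of change leaves the set of interesting voxels, and hence the hierarchical enumeration, intact. The box-shaped region, the default factors, and the function name are illustrative assumptions.

```python
import numpy as np

def highlight_region(opacity, center, half_size, boost=2.0, attenuation=0.25):
    """Emphasize an axis-aligned box in an opacity volume.

    Opacities inside the box are multiplied by `boost` (clamped to 1) and
    those outside by `attenuation`. With positive factors, zero opacities
    stay zero and nonzero ones stay nonzero.
    """
    coords = np.indices(opacity.shape)
    inside = np.ones(opacity.shape, dtype=bool)
    for axis, (c, h) in enumerate(zip(center, half_size)):
        inside &= np.abs(coords[axis] - c) <= h
    return np.where(inside, np.minimum(opacity * boost, 1.0), opacity * attenuation)
```

Moving the region only changes `center`, so the pyramid of binary volumes built from the original opacities can be reused.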


INTERACTIVE  CINE  SEQUENCES 

We regularly use precalculated cine sequences to increase comprehension of complex 3D structures
whose renderings each take minutes to calculate. We allow user control of the image selection
from the precalculated sequence so that the images can be made to rock back and forth, for
instance, or to "cut" through the volume along locations of particular interest. We store the
precalculated sequence in the image memory of either a Pixar Image Computer™ or an Adage (née
Ikonas) RDS 3000™. Our physician colleagues find particularly useful those sequences whose
individual images vary only slightly in the position of a hither clipping plane; such a sequence
from the side of the head, for example, allows them to study in detail tiny complex 3D structures
of the middle and inner ear. Any single image from such a sequence is difficult to understand, even
for a specialist, but user control of a (moving) sequence significantly aids the user's comprehension.

When  calculating  a  sequence  of  images,  we  usually  vary  only  a  single  parameter,  such  as  the  angle 
of  rotation  about  the  vertical  axis  or  the  position  of  a  cutting  plane,  in  order  to  maximize  the  user’s 
intuition  for  controlling  interactive  playback  of  the  sequence.  The  user,  during  playback,  would 
often  like  to  vary  multiple  parameters  independently  (rotation  and  cutting  plane  position). 
Unfortunately, for that capability, the number of precalculated and stored images is the product
of the numbers of steps in the variation of each parameter; a modest 20 steps for each of two
variables requires the calculation and storage of 400 images. Our current image memory capacity
of 64 512 x 512 8-bit images seriously limits the extent to which we can independently vary
multiple parameters during playback.
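The storage arithmetic can be made explicit. The numbers (20 steps per parameter, 64 images of 512 x 512 x 8 bits) come from the text; the helper names are hypothetical, and giving each parameter the same number of steps is an assumption made for illustration.

```python
from math import prod

def images_required(steps_per_parameter):
    """Images to precalculate when each parameter is varied independently."""
    return prod(steps_per_parameter)

def max_steps_per_parameter(capacity, parameters=2):
    """Largest equal number of steps per parameter that fits in `capacity` images."""
    steps = 1
    while (steps + 1) ** parameters <= capacity:
        steps += 1
    return steps

IMAGE_BYTES = 512 * 512          # one 512 x 512 8-bit image
MEMORY_BYTES = 64 * IMAGE_BYTES  # total capacity of the current image memory
```

With 64 stored images, two independently varied parameters can have at most 8 steps each, and three parameters at most 4 each, far short of the 20 steps mentioned above.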

POLYGON  RENDERINGS 

Unfortunately  we  still  cannot  generate  high-quality  volume  renderings  at  interactive  rates.  We 
therefore continue to develop rendering techniques and systems based on polygonal
representations.  Also,  of  course,  for  some  applications  polygonal  representations  are  more  natural 
than  volumetric  ones. 

We  incur  two  serious  disadvantages  using  polygons  for  visualizing  3D  medical  data.  First,  in  many 
situations  the  surface  of  the  object  of  interest  cannot  be  defined  automatically;  this  happens 
especially with soft tissue structures (such as brain tumors and abdominal organs) that themselves
lie  within  soft  tissue  regions.  Second,  the  images  from  a  polygon  dataset  cannot  show  many  of  the 
subtleties  of  the  original  3D  data  that  may  be  important  to  the  user,  for  much  of  the  information  is 
not  embodied  in  explicit  surfaces.  Nevertheless,  the  rapid  interaction  that  polygon 
representations  make  possible  allows  interactive,  extemporaneous  exploration  of  the  3D  structures 
that  is  infeasible  with  any  precalculated  sequence.  Figure  7  shows  one  such  image.  The  user 
typically manipulates the 3D structure with dual 3-axis joysticks that allow "zooming in" to any
part  of  the  3D  volume.  For  example,  our  radiation  therapy  users  often  want  to  understand  the 
spatial  relationships  between  the  radiation  isodose  surfaces  and  some  healthy  organ  boundaries 
and  will  maneuver  the  viewing  position  to  be  inside  the  anatomy  in  order  to  understand  the 
detailed  local  structure. 

For  interactive  display  of  polygon  data,  we  have  been  using  our  locally  developed  Pixel-Planes  4 
system [Eyles 1988]. Although it is more than two years old now and its speed is no longer
trailblazing (some 25,000 individual triangles or 40,000 triangles in strips per second), it is still
one  of  the  fastest  existing  graphics  engines  for  these  3D  medical  applications.  In  these 
applications,  the  viewing  position  is  often  inside  a  3D  polygonal  structure,  so  most  pixels  may  be 
painted  by  multiple  polygons,  and  many  polygons,  being  close  to  the  viewer,  are  quite  large. 
Objects  like  the  one  shown  in  Figure  7  can  be  manipulated  at  about  5  frames  per  second. 


Figure  7:  Interactive  visualization  on  Pixel-Planes  4  of  a  female  pelvis  with  vaginal  inserts  for 
radiation sources. The long bulbous object surrounding the tip of the three vertical shafts is a radiation
isodose  surface. 


We  have  been  working  with  colleagues  in  UNC  Departments  of  Radiology  and  Biomedical 
Engineering in the visualization of two superimposed but distinct 3D data sets containing
anatomical and physiological data of the human lung. The anatomical data is from conventional
transmission CT, conventional except that the radiation source is a flood field of gamma rays; the
physiological  data  is  gathered  by  emission  CT  techniques  and  shows  the  degree  of  lung  perfusion 
by  radioactive  trace  material.  In  this  application,  to  avoid  confusion  and  to  reduce  rendering 
time,  Pixel-Planes  4  displays  (at  interactive  rates)  the  physiology  data  only  within  the  known 
region  of  interest — inside  the  lung.  In  particular,  it  displays  the  physiology  data  within  the  lung 
as  grey-scale  only  on  the  surface  exposed  by  a  user-controlled  cutting  plane.  The  user  is  free  to 
move  the  viewing  position  anywhere  about  the  chest  and  lungs  and  can  select  independently  the 
horizontal  cutting  plane  (Figure  8). 


Figure 8: Interactive display on Pixel-Planes 4 of lung anatomy with lung perfusion data shown
in gray-scale on the cut surface of the lungs.


UNUSUAL  3D  DISPLAY  HARDWARE 
Stereo  Plate 

The  simplest,  most  common  enhancement  for  3D  display  is  some  kind  of  stereo  viewer.  We  have 
been  using  a  variety  of  these  for  many  years.  Our  current  favorite  is  a  polarizing  liquid  crystal 
plate  manufactured  by  Tektronix.  The  plate  is  mounted  on  the  front  of  the  video  display;  the 
plate's direction of polarization is electronically switched at the field rate. The viewer wears
inexpensive  passive  glasses  with  polarizing  material  of  different  direction  in  each  lens.  With 
this  system  multiple  viewers  can  see  the  same  stereo  image,  and  each  can  look  about  at  multiple 
video  displays. 
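A minimal sketch of the field-sequential scheme implied by switching the plate at the field rate. The 60 fields per second rate, the even-field/left-eye convention, and the plate state labels `"A"` and `"B"` are assumptions for illustration, not details of the Tektronix plate.

```python
FIELD_RATE_HZ = 60  # assumption: interlaced video at 60 fields per second

def field_schedule(num_fields):
    """Return (field_index, eye, plate_state) for each video field.

    Hypothetical convention: even fields carry the left-eye image and odd
    fields the right-eye image. The plate switches polarization state in step,
    so each passive lens passes only the fields meant for its eye.
    """
    schedule = []
    for i in range(num_fields):
        eye = "left" if i % 2 == 0 else "right"
        state = "A" if eye == "left" else "B"
        schedule.append((i, eye, state))
    return schedule

def images_per_eye_per_second(field_rate_hz=FIELD_RATE_HZ):
    # Each eye sees every other field, so it is updated at half the field rate.
    return field_rate_hz / 2
```

Because the glasses are passive, nothing has to be synchronized to the viewer; only the plate follows the video timing, which is why several viewers can share one display.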

Varifocal  Mirror  Display 

This unusual device, which we and others have been developing, provides "true" 3D perception (head-motion parallax and stereopsis) of a 3D intensity distribution [Mills et al. 19^]. The display can be viewed by several observers simultaneously without their needing to wear any special apparatus. Our display consists of an aluminized Mylar membrane stretched on a drumhead-like structure and vibrated acoustically from behind by a large conventional speaker driven by a low-frequency (30 Hz) sine wave. Viewers looking into this vibrating (and thus "varifocal") mirror see the image of a simple point-plotting CRT on which a sequence of dots is rapidly displayed. The displayed list of dots is repeated at the 30 Hz rate, synchronized to the mirror vibration. Each dot is thus perceived at a particular depth, depending on the position of the mirror when the dot is displayed. Some 100,000 to 250,000 3D points can easily be displayed directly from a conventional color frame buffer. Interactive variation of viewing position, object selection, scale, and clipping planes has been found to be quite important and is provided directly by the processor in a modern raster graphics system.
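Because each dot's depth is set by *when* it is plotted, a display list can be ordered by plotting time. The sketch below assumes a simplification not stated in the text: perceived depth is taken to be proportional to the membrane displacement A·sin(2πft), and dots are plotted only during the outward half of each stroke. The real optical mapping from membrane deflection to image depth is nonlinear, so this is illustrative only; `display_time` and `build_display_list` are hypothetical names.

```python
import math

MIRROR_HZ = 30.0  # drive frequency quoted in the text

def display_time(depth, freq_hz=MIRROR_HZ):
    """Time (s) within one mirror cycle at which to plot a dot.

    depth is normalized to [-1, 1]. Assumes displacement A*sin(2*pi*f*t),
    which rises from -A to +A for t in [-1/(4f), 1/(4f)]; the result is
    shifted so this half-stroke starts at t = 0 and ends at t = 1/(2f).
    """
    if not -1.0 <= depth <= 1.0:
        raise ValueError("depth must be in [-1, 1]")
    omega = 2.0 * math.pi * freq_hz
    return math.asin(depth) / omega + 1.0 / (4.0 * freq_hz)

def build_display_list(points, freq_hz=MIRROR_HZ):
    """Order (x, y, depth) points by the time at which each must be drawn."""
    return sorted(points, key=lambda p: display_time(p[2], freq_hz))
```

At the upper end of the quoted range, 250,000 points repeated 30 times per second means the CRT must plot 7.5 million dots per second.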


The working volume is limited by the size of the display and the deflection of the mirror membrane. Viewers can move about the display, limited only by the positions from which the CRT face is visible in the mirror. The display is particularly well suited for the immediate display of 3D intensity data points, as well as data points giving surface likelihood, such as might come from an ultrasound scanner, since no significant computation is necessary in the display: each 3D point is immediately stored in the appropriate location in the display-list memory.
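To show why no significant computation is needed per point, here is a minimal sketch of a display list organized into depth slots, each corresponding to a time interval of the mirror sweep. The slot count and the linear depth-to-slot quantization are hypothetical choices; with a sinusoidal mirror the slots would not all have equal duration. The point is that inserting an incoming sample (for example from an ultrasound scanner) is a constant-time append, with no sorting or rendering.

```python
NUM_SLOTS = 64  # hypothetical number of depth slots per mirror sweep

class VarifocalDisplayList:
    """Hypothetical display-list memory: one bucket of dots per depth slot."""

    def __init__(self, num_slots=NUM_SLOTS):
        self.slots = [[] for _ in range(num_slots)]

    def slot_for_depth(self, depth):
        # Linear quantization of normalized depth [-1, 1] to a slot index.
        n = len(self.slots)
        i = int((depth + 1.0) / 2.0 * n)
        return min(max(i, 0), n - 1)

    def add_point(self, x, y, depth, intensity):
        # Constant-time insertion: the point goes straight into its slot.
        self.slots[self.slot_for_depth(depth)].append((x, y, intensity))

    def sweep(self):
        # Dots in slot order, i.e. the order in which the CRT would plot them.
        for slot in self.slots:
            yield from slot
```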

(Unfortunately for the reader of a scientific paper, any 2D photograph of a complex 3D image on such a display appears as a confusing cluster of points, since the "true" 3D cues of stereopsis and head-motion parallax are absent.)

Head-Mounted  Display 

For about a decade, there has been head-mounted display research in our department [Chung et al. 1989]. This is an unusual type of computer graphics display, introduced by [Sutherland 1968], in which the real-time image generated on the display (Figure 9) changes as a function of the user's head position, giving the user the illusion of walking about a simulated object or even of walking inside a simulated 3D environment. These systems currently suffer from several weaknesses: graphics systems that cannot generate new images at 30 Hz, poor resolution of the small video displays, and inaccurate, high-latency, limited-range tracking systems. Nonetheless, these systems hold great promise. They may give the user a dramatically stronger sense of where he is located at all times with respect to the object of interest, and may give stronger comprehension of 3D structure through head-motion parallax (in addition to stereopsis). For comprehension of 3D volume data in particular, these head-mounted displays may also allow (with the addition of a hand-held positioning device) simple hand-directed "erasure" of uninteresting but confusing parts of the volume or, alternatively, highlighting of regions of particular interest. The use of this modality for exploring possible radiotherapy treatment beam positions with a continuously controllable, true "beam's eye view" is under investigation in our laboratory.
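The core of a head-mounted display is recomputing the viewing transform from the tracked head position for every frame. The sketch below uses a standard "look-at" construction (the same convention as OpenGL's `gluLookAt`), assuming, for illustration, that the tracker reports only head position and that the view is aimed at a fixed point of interest. A real system would also use head orientation and would render a separate view for each eye; `view_matrix` and `transform` are hypothetical names.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))  # fails if v is zero, e.g. head exactly at target
    return tuple(x / n for x in v)

def view_matrix(head_pos, target, up=(0.0, 1.0, 0.0)):
    """4x4 world-to-eye matrix (row-major) for a viewer at head_pos.

    The eye looks down its -z axis toward target. Assumes the viewing
    direction is not parallel to up.
    """
    f = normalize(sub(target, head_pos))   # forward
    r = normalize(cross(f, up))            # right
    u = cross(r, f)                        # corrected up
    return [
        [r[0], r[1], r[2], -dot(r, head_pos)],
        [u[0], u[1], u[2], -dot(u, head_pos)],
        [-f[0], -f[1], -f[2], dot(f, head_pos)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(m, p):
    """Apply a 4x4 matrix to a 3D point (w = 1), returning x, y, z."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(3))
```

In use, `view_matrix` would be called once per frame with the latest tracker sample; any tracker latency then appears directly as lag between head motion and image motion, which is why the tracking weaknesses listed above matter so much.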


Figure  9:  UNC's  latest  head-gear  for  the  head-mounted  display  system.  Twin  color  video  LCDs 
are  mounted  on  the  bottom  of  the  visor.  The  wearer  looks  through  half-silvered  mirrors  to  see  in 
stereo  the  simulated  objects  superimposed  on  his  physical  environment.  (Photo  by  Mark  Harris) 
FUTURE WORK

We and other colleagues in our department are developing our next-generation graphics system, Pixel-Planes 5 [Fuchs et al. 19^]. This system, with about 20 times the processing power of its predecessor, may also be a good base on which to develop interactive volume-rendering algorithms. Our preliminary investigations indicate that crude 512 × 512 images from 256 × 256 × 256-voxel arrays can be generated in about 100 milliseconds, and that such an image can then be progressively refined for up to 1 second to reach our current highest-quality image.
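One generic way to obtain a crude image quickly and then refine it is to cast rays on a coarse pixel grid first and fill the gaps by block replication, then halve the grid spacing on each pass. The sketch below illustrates that idea only; it is not claimed to be the Pixel-Planes 5 method. The stride sequence and the `shade(x, y)` callback (standing in for casting one ray through the 256 × 256 × 256 volume, about 16.8 million voxels) are assumptions. Every pixel is shaded exactly once across all passes.

```python
def refinement_passes(width, height, strides=(8, 4, 2, 1)):
    """Yield (stride, new_pixels) per pass; each pixel appears exactly once.

    Assumes each stride divides the previous one, so pixels already on the
    previous (coarser) grid can be skipped.
    """
    prev = None
    for s in strides:
        new = [(x, y)
               for y in range(0, height, s)
               for x in range(0, width, s)
               if prev is None or x % prev or y % prev]
        yield s, new
        prev = s

def progressive_render(width, height, shade, strides=(8, 4, 2, 1)):
    """Yield (stride, image) after each pass; image is refined in place.

    shade(x, y) is a hypothetical per-pixel ray-casting function. Each newly
    shaded pixel fills its s-by-s block, giving a blocky preview that later
    passes overwrite. The same image object is yielded every time.
    """
    image = [[0.0] * width for _ in range(height)]
    for s, pixels in refinement_passes(width, height, strides):
        for x, y in pixels:
            v = shade(x, y)
            for yy in range(y, min(y + s, height)):
                for xx in range(x, min(x + s, width)):
                    image[yy][xx] = v
        yield s, image
```

For a 512 × 512 image, the first pass with these strides shades only 4,096 pixels (1/64 of the image), which is what makes a fast crude preview possible; the remaining passes shade 12,288, 49,152, and 196,608 new pixels.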